Nesterov’s accelerated gradient (NAG) is a momentum-based optimizer designed to mitigate the tendency of vanilla SGD with momentum to overshoot the optimum.
NAG proceeds in two steps. First, it computes a set of “look-ahead parameters” by taking a plain gradient step from the current parameters $\theta_{t-1}$, with learning rate $\eta$ and loss $L$:

$$\tilde{\theta}_t = \theta_{t-1} - \eta \, \nabla L(\theta_{t-1}).$$
The momentum term is based on the change in these look-ahead parameters:

$$\theta_t = \tilde{\theta}_t + \mu \,\bigl(\tilde{\theta}_t - \tilde{\theta}_{t-1}\bigr),$$

where $\mu$ is the momentum coefficient.
In other words, the momentum reflects the rate of change in the parameters if they had continued to descend as they did at the previous step: the update extrapolates along the most recent change in the look-ahead parameters, $\tilde{\theta}_t - \tilde{\theta}_{t-1}$.
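As a rough sketch, the two-step update can be written in a few lines of NumPy. The function name `nag_step`, the hyperparameter values, and the quadratic toy loss are illustrative assumptions, not part of the original description:

```python
# Minimal sketch of the two-step NAG update described above.
# nag_step, the toy quadratic loss, and all hyperparameters are illustrative.
import numpy as np

def nag_step(theta, lookahead_prev, grad_fn, lr=0.1, momentum=0.9):
    """One NAG update.

    theta          : current parameters theta_{t-1}
    lookahead_prev : previous look-ahead parameters tilde(theta)_{t-1}
    grad_fn        : function returning the gradient of the loss
    Returns (new parameters theta_t, new look-ahead parameters tilde(theta)_t).
    """
    # Step 1: look-ahead parameters -- a plain gradient step from theta_{t-1}.
    lookahead = theta - lr * grad_fn(theta)
    # Step 2: momentum extrapolates along the change in the look-ahead parameters.
    theta_new = lookahead + momentum * (lookahead - lookahead_prev)
    return theta_new, lookahead

# Toy usage: minimize f(x) = 0.5 * x^T A x on an ill-conditioned quadratic.
A = np.diag([1.0, 25.0])
grad = lambda x: A @ x            # gradient of the quadratic loss

theta = np.array([1.0, 1.0])
lookahead_prev = theta.copy()     # initialize tilde(theta)_0 = theta_0
for t in range(100):
    theta, lookahead_prev = nag_step(theta, lookahead_prev, grad,
                                     lr=0.03, momentum=0.9)
print(theta)  # approaches the minimizer at the origin
```

Because the momentum term reuses the previous look-ahead point, the only extra state this sketch carries between iterations is `lookahead_prev`, which plays the role of the velocity buffer in the more common velocity-based formulations of NAG.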